Section: New Results
Mixture models
Taking into account the curse of dimensionality
Participant : Stéphane Girard.
Joint work with: Bouveyron, C. (Université Paris 1), Celeux, G. (Select, INRIA).
In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the INRIA LEAR team) [53], we propose new Gaussian models of high-dimensional data for classification purposes. We assume that the data live in several groups located in subspaces of lower dimension. Two different strategies arise:
- the introduction in the model of a dimension-reduction constraint for each group;
- the use of parsimonious models obtained by requiring different groups to share the same values of some parameters.
This modelling yields a new supervised classification method called High Dimensional Discriminant Analysis (HDDA) [4]. Some versions of this method have been tested on the supervised classification of objects in images. This approach has been adapted to the unsupervised classification framework, and the related method is named High Dimensional Data Clustering (HDDC) [3].
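The core idea above, each group concentrating near its own low-dimensional subspace, can be sketched numerically. The snippet below is a minimal illustration, not the HDDA/HDDC algorithm itself: for each (here known) group it eigendecomposes the empirical covariance and keeps the smallest number of eigenvectors explaining a variance threshold, a simple stand-in for the per-group dimension-reduction constraint. All data, dimensions, and the threshold are assumptions for the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two groups living near different low-dimensional
# subspaces of R^10 (illustrative setup only).
def make_group(n, dim, intrinsic_dim, offset):
    basis = np.linalg.qr(rng.normal(size=(dim, intrinsic_dim)))[0]
    scores = rng.normal(scale=3.0, size=(n, intrinsic_dim))
    noise = rng.normal(scale=0.1, size=(n, dim))
    return scores @ basis.T + noise + offset

X1 = make_group(200, 10, 2, offset=0.0)
X2 = make_group(200, 10, 3, offset=5.0)

def group_subspace(X, var_threshold=0.95):
    """Per-group eigendecomposition of the empirical covariance:
    keep the smallest number of leading eigenvectors explaining
    `var_threshold` of the variance (a crude analogue of the
    group-specific dimension selection used in HDDA/HDDC)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order
    ratios = np.cumsum(eigvals) / eigvals.sum()
    d = int(np.searchsorted(ratios, var_threshold)) + 1
    return d, eigvecs[:, :d]

d1, _ = group_subspace(X1)
d2, _ = group_subspace(X2)
print(d1, d2)  # recovers the intrinsic dimensions 2 and 3
```

Each group thus gets its own subspace dimension, which is exactly the quantity the parsimonious HDDA/HDDC models constrain or share across groups.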
In collaboration with Gilles Celeux and Charles Bouveyron, we have designed an automatic selection of the discrete parameters of the model [12]. In addition, a description of the associated R package has been submitted for publication [44].
A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: Application to robust clustering
Participants : Florence Forbes, Darren Wraith.
We propose a family of multivariate heavy-tailed distributions that allows variable marginal amounts of tailweight. The originality comes from the eigenvalue decomposition of the covariance matrix in the traditional Gaussian scale mixture representation. By contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular cases of Pearson type VII and t tails. For these cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on clustering examples for several simulated and real data sets.
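The construction can be sketched by sampling: in a classical multivariate t, a single Gamma-distributed scale weight is shared by all coordinates, whereas here each eigen-direction of the covariance receives its own weight and hence its own tailweight. The code below is a sketch of that scale-mixture idea under an assumed parameterisation, not the exact model of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_multiscale_t(n, mu, D, A, nus):
    """Draw from a 'multiple scaled' t-like distribution: the
    covariance eigendecomposition D diag(A) D^T receives an
    independent Gamma(nu_j/2, rate nu_j/2) weight per eigen-direction,
    so each direction has its own degrees of freedom.
    (Sketch only; the exact parameterisation is an assumption.)"""
    p = len(mu)
    nus = np.asarray(nus, dtype=float)
    # one weight per direction and per sample, with mean 1
    w = rng.gamma(shape=nus / 2.0, scale=2.0 / nus, size=(n, p))
    z = rng.normal(size=(n, p)) * np.sqrt(np.asarray(A)) / np.sqrt(w)
    return mu + z @ D.T

mu = np.zeros(2)
D = np.eye(2)              # eigenvectors (identity for simplicity)
A = np.array([1.0, 1.0])   # eigenvalues
nus = [2.0, 200.0]         # heavy tail on axis 1, near-Gaussian on axis 2
X = sample_multiscale_t(100_000, mu, D, A, nus)

# The heavy-tailed first coordinate produces far more extreme values:
print((np.abs(X[:, 0]) > 6).mean(), (np.abs(X[:, 1]) > 6).mean())
```

Setting all the `nus` equal recovers a classical elliptical t-type law; letting them differ is what yields the variable marginal tailweight exploited for robust clustering.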